1. Data Organization and Exploration

Back to Outline

1.1: Importing Libraries and Setting Preferences

Back to Outline

In [253]:
# Importing all important libraries here!
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
from datetime import datetime
from dateutil.parser import parse
from fbprophet import Prophet
from sklearn.metrics import mean_squared_error, mean_absolute_error
In [254]:
# Setting notebook preferences
pd.set_option('display.precision', 2)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 500)
sns.set(style = 'whitegrid', font_scale = 1.25)

For this analysis we are going to use our final merged dataset, as it has all of the weather and temperature information we need.

In [255]:
# Importing our final merged dataset to use for modeling
df = pd.read_csv('complete.csv')
In [256]:
df.head()
Out[256]:
Unnamed: 0 date_only month mnth_yr day1_of_the_week time_of_occurrence day_of_the_year temp_in_F humidity percip_inches 911_call_type type_of_incident division sector council_district victim_type victim_race victim_ethnicity victim_gender victim_age responding_officer_#1__badge_no responding_officer_#1__name responding_officer_#2_badge_no responding_officer_#2__name x_coordinate y_cordinate zip_code total_pop male %_male female %_female median_age 18_&_over 21_&_over 62_&_over 65_&_over %_white %_black %_native %_asian %_hispanic pop_over_16 %_pop_over_16 %_employed %_unemployed mean_household_income %_families_poverty %_all_people_poverty year zip_code.1
0 0 2015-01-01 1 January-2015 Thu 17:00 1 34.35 89.47 0.03 11V - BURG MOTOR VEH BMV SOUTH CENTRAL 730.0 D8 Individual White Non-Hispanic or Latino Male 50 8173 JONES,REGINALD,LADUNNE NaN NaN 2.51e+06 6.93e+06 75241 50872.0 23984.0 45.6 26888.0 54.4 34.0 71.8 68.1 14.1 12.0 21.1 74.7 11.5 0.0 20.3 3777.1 3777.1 48.9 6.3 42029.6 27.8 30.6 2015 75241
1 1 2015-01-01 1 January-2015 Thu 00:20 1 34.35 89.47 0.03 31 - CRIMINAL MISCHIEF CRIM MISCHIEF > OR EQUAL $50 BUT < $500 SOUTH CENTRAL 710.0 D4 Individual Black Non-Hispanic or Latino Male 51 8133 ADAMS,CORY,JAMES NaN NaN 2.49e+06 6.95e+06 75216 76015.0 35922.0 46.6 40093.0 53.4 35.2 71.9 67.9 16.3 13.4 29.9 65.0 3.1 0.1 32.8 3142.2 3142.2 45.7 6.6 35651.2 34.2 38.6 2015 75216
2 2 2015-01-01 1 January-2015 Thu 08:00 1 34.35 89.47 0.03 58 - ROUTINE INVESTIGATION FRAUD USE/POSS IDENTIFYING INFO-PRELIMINARY IN... CENTRAL 150.0 D2 Individual Black Non-Hispanic or Latino Female 64 7341 FREEMAN,DIANA,J NaN NaN 2.49e+06 6.97e+06 75215 22570.0 11298.0 48.8 11272.0 51.2 38.8 79.2 76.1 16.2 12.8 22.7 73.6 3.1 0.7 17.0 2085.6 2085.6 44.9 6.5 36629.8 30.3 37.2 2015 75215
3 3 2015-01-01 1 January-2015 Thu 02:00 1 34.35 89.47 0.03 40 - OTHER ASSAULT -VERBAL THREAT NORTHWEST 520.0 D6 Individual Hispanic or Latino Hispanic or Latino Male 36 10767 HOVIS,ALAN 5455 AKON,FREDRICK,CHARLES 2.47e+06 7.00e+06 75220 69009.0 38379.0 53.0 30630.0 47.0 33.7 74.9 71.2 12.0 9.1 75.7 7.3 16.0 3.0 48.1 4477.7 4477.7 62.7 4.8 104542.2 17.9 19.8 2015 75220
4 4 2015-01-01 1 January-2015 Thu 13:00 1 34.35 89.47 0.03 31 - CRIMINAL MISCHIEF CRIM MISCHIEF > OR EQUAL $50 BUT < $500 NORTHEAST 220.0 D9 Individual Black Non-Hispanic or Latino Female 70 9654 BANDAS,WAYI,ALIBEY NaN NaN 2.53e+06 6.99e+06 75228 106467.0 52189.0 49.1 54278.0 50.9 31.8 70.2 66.2 11.5 9.0 49.3 23.6 13.2 1.9 49.0 3865.9 3865.9 60.6 7.0 51758.2 22.6 24.5 2015 75228
In [257]:
# Looking for duplicates, just in case
df['Unnamed: 0'].nunique()
Out[257]:
254724
In [258]:
len(df)
Out[258]:
254724
In [259]:
# Getting rid of our unnecessary index column
df.drop(columns = 'Unnamed: 0', inplace = True)

Now that we have our data loaded, let's determine the main goal of our time series analysis and refine our data for modeling!

The main research question/hypothesis for our time series analysis is: how, if at all, does daily temperature affect daily crime reports in Dallas, Texas?

Since our main focus is on time-based features, it is especially important that all of our date columns are set up correctly. Let's start working on this below!

In [260]:
df['date_only'] = pd.to_datetime(df['date_only'])
In [261]:
# Creating new month and day-of-year columns from the datetime column
df['mnth'] = df['date_only'].dt.month
df['day_of_year_number'] = df['date_only'].dt.dayofyear
In [262]:
# Sanity check: does everything look like we expect it to?
df.head()
Out[262]:
date_only month mnth_yr day1_of_the_week time_of_occurrence day_of_the_year temp_in_F humidity percip_inches 911_call_type type_of_incident division sector council_district victim_type victim_race victim_ethnicity victim_gender victim_age responding_officer_#1__badge_no responding_officer_#1__name responding_officer_#2_badge_no responding_officer_#2__name x_coordinate y_cordinate zip_code total_pop male %_male female %_female median_age 18_&_over 21_&_over 62_&_over 65_&_over %_white %_black %_native %_asian %_hispanic pop_over_16 %_pop_over_16 %_employed %_unemployed mean_household_income %_families_poverty %_all_people_poverty year zip_code.1 mnth day_of_year_number
0 2015-01-01 1 January-2015 Thu 17:00 1 34.35 89.47 0.03 11V - BURG MOTOR VEH BMV SOUTH CENTRAL 730.0 D8 Individual White Non-Hispanic or Latino Male 50 8173 JONES,REGINALD,LADUNNE NaN NaN 2.51e+06 6.93e+06 75241 50872.0 23984.0 45.6 26888.0 54.4 34.0 71.8 68.1 14.1 12.0 21.1 74.7 11.5 0.0 20.3 3777.1 3777.1 48.9 6.3 42029.6 27.8 30.6 2015 75241 1 1
1 2015-01-01 1 January-2015 Thu 00:20 1 34.35 89.47 0.03 31 - CRIMINAL MISCHIEF CRIM MISCHIEF > OR EQUAL $50 BUT < $500 SOUTH CENTRAL 710.0 D4 Individual Black Non-Hispanic or Latino Male 51 8133 ADAMS,CORY,JAMES NaN NaN 2.49e+06 6.95e+06 75216 76015.0 35922.0 46.6 40093.0 53.4 35.2 71.9 67.9 16.3 13.4 29.9 65.0 3.1 0.1 32.8 3142.2 3142.2 45.7 6.6 35651.2 34.2 38.6 2015 75216 1 1
2 2015-01-01 1 January-2015 Thu 08:00 1 34.35 89.47 0.03 58 - ROUTINE INVESTIGATION FRAUD USE/POSS IDENTIFYING INFO-PRELIMINARY IN... CENTRAL 150.0 D2 Individual Black Non-Hispanic or Latino Female 64 7341 FREEMAN,DIANA,J NaN NaN 2.49e+06 6.97e+06 75215 22570.0 11298.0 48.8 11272.0 51.2 38.8 79.2 76.1 16.2 12.8 22.7 73.6 3.1 0.7 17.0 2085.6 2085.6 44.9 6.5 36629.8 30.3 37.2 2015 75215 1 1
3 2015-01-01 1 January-2015 Thu 02:00 1 34.35 89.47 0.03 40 - OTHER ASSAULT -VERBAL THREAT NORTHWEST 520.0 D6 Individual Hispanic or Latino Hispanic or Latino Male 36 10767 HOVIS,ALAN 5455 AKON,FREDRICK,CHARLES 2.47e+06 7.00e+06 75220 69009.0 38379.0 53.0 30630.0 47.0 33.7 74.9 71.2 12.0 9.1 75.7 7.3 16.0 3.0 48.1 4477.7 4477.7 62.7 4.8 104542.2 17.9 19.8 2015 75220 1 1
4 2015-01-01 1 January-2015 Thu 13:00 1 34.35 89.47 0.03 31 - CRIMINAL MISCHIEF CRIM MISCHIEF > OR EQUAL $50 BUT < $500 NORTHEAST 220.0 D9 Individual Black Non-Hispanic or Latino Female 70 9654 BANDAS,WAYI,ALIBEY NaN NaN 2.53e+06 6.99e+06 75228 106467.0 52189.0 49.1 54278.0 50.9 31.8 70.2 66.2 11.5 9.0 49.3 23.6 13.2 1.9 49.0 3865.9 3865.9 60.6 7.0 51758.2 22.6 24.5 2015 75228 1 1
In [263]:
# Getting rid of our old columns
df.drop(columns = ['month', 'day_of_the_year'], inplace = True)
In [264]:
# Just checking our overall info for the df again
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 254724 entries, 0 to 254723
Data columns (total 50 columns):
date_only                          254724 non-null datetime64[ns]
mnth_yr                            254724 non-null object
day1_of_the_week                   254724 non-null object
time_of_occurrence                 254724 non-null object
temp_in_F                          254724 non-null float64
humidity                           254724 non-null float64
percip_inches                      254724 non-null float64
911_call_type                      254724 non-null object
type_of_incident                   254724 non-null object
division                           254713 non-null object
sector                             254713 non-null float64
council_district                   254578 non-null object
victim_type                        254724 non-null object
victim_race                        254724 non-null object
victim_ethnicity                   254724 non-null object
victim_gender                      254724 non-null object
victim_age                         254724 non-null int64
responding_officer_#1__badge_no    254724 non-null object
responding_officer_#1__name        254724 non-null object
responding_officer_#2_badge_no     85131 non-null object
responding_officer_#2__name        85131 non-null object
x_coordinate                       254724 non-null float64
y_cordinate                        254724 non-null float64
zip_code                           254724 non-null int64
total_pop                          254724 non-null float64
male                               254724 non-null float64
%_male                             254724 non-null float64
female                             254724 non-null float64
%_female                           254724 non-null float64
median_age                         254724 non-null float64
18_&_over                          254724 non-null float64
21_&_over                          254724 non-null float64
62_&_over                          254724 non-null float64
65_&_over                          254724 non-null float64
%_white                            254724 non-null float64
%_black                            254724 non-null float64
%_native                           254724 non-null float64
%_asian                            254724 non-null float64
%_hispanic                         254724 non-null float64
pop_over_16                        254724 non-null float64
%_pop_over_16                      254724 non-null float64
%_employed                         254724 non-null float64
%_unemployed                       254724 non-null float64
mean_household_income              254724 non-null float64
%_families_poverty                 254724 non-null float64
%_all_people_poverty               254724 non-null float64
year                               254724 non-null int64
zip_code.1                         254724 non-null int64
mnth                               254724 non-null int64
day_of_year_number                 254724 non-null int64
dtypes: datetime64[ns](1), float64(28), int64(6), object(15)
memory usage: 97.2+ MB

1.2: Taking a Closer Look at Data Distribution and Correlations

Back to Outline

Before we can begin modeling, we have a few main issues to deal with: null values, the distributions of specific features, and how (if at all) our features correlate with our target variable.
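As a quick first pass at the temperature/crime question, we can collapse the report-level rows to one row per day and compute a Pearson correlation. Here is a minimal sketch on a hypothetical toy frame (the real notebook would do this with `df` and its `date_only` and `temp_in_F` columns):

```python
import pandas as pd

# Toy stand-in for the report-level crime frame (made-up values):
# 3 reports on a cold day, 9 reports on a hot day.
toy = pd.DataFrame({
    'date_only': pd.to_datetime(['2015-01-01'] * 3 + ['2015-07-01'] * 9),
    'temp_in_F': [34.4] * 3 + [95.1] * 9,
})

# Collapse to one row per day: a crime count plus that day's temperature
daily = toy.groupby('date_only').agg(
    daily_crime_count=('temp_in_F', 'size'),
    temp_in_F=('temp_in_F', 'first'),
)

# Pearson correlation between daily temperature and daily crime count
print(daily['temp_in_F'].corr(daily['daily_crime_count']))  # 1.0 on this toy data
```

On real data the correlation will of course be far from perfect; the point is only that the question has to be asked at the daily level, not the report level.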

In [265]:
# Where my nulls at?
df.isnull().sum().sort_values(ascending = False)
Out[265]:
responding_officer_#2__name        169593
responding_officer_#2_badge_no     169593
council_district                      146
division                               11
sector                                 11
victim_race                             0
y_cordinate                             0
x_coordinate                            0
responding_officer_#1__name             0
responding_officer_#1__badge_no         0
victim_age                              0
victim_gender                           0
victim_ethnicity                        0
day_of_year_number                      0
victim_type                             0
mnth                                    0
type_of_incident                        0
911_call_type                           0
percip_inches                           0
humidity                                0
temp_in_F                               0
time_of_occurrence                      0
day1_of_the_week                        0
mnth_yr                                 0
zip_code                                0
total_pop                               0
male                                    0
%_male                                  0
zip_code.1                              0
year                                    0
%_all_people_poverty                    0
%_families_poverty                      0
mean_household_income                   0
%_unemployed                            0
%_employed                              0
%_pop_over_16                           0
pop_over_16                             0
%_hispanic                              0
%_asian                                 0
%_native                                0
%_black                                 0
%_white                                 0
65_&_over                               0
62_&_over                               0
21_&_over                               0
18_&_over                               0
median_age                              0
%_female                                0
female                                  0
date_only                               0
dtype: int64
In [266]:
# Dropping rows with nulls so we can model (most nulls are in the second-officer columns)
df.dropna(inplace = True)
In [267]:
# Taking a quick look to see how much the length of our data changed by dropping nulls.
len(df)
Out[267]:
85066
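The drop from 254,724 to 85,066 rows is driven almost entirely by the two second-officer columns, which are null whenever only one officer responded. A hypothetical alternative, sketched on a toy frame with made-up values, is to drop those sparse columns before `dropna` so that single-officer reports are kept:

```python
import pandas as pd
import numpy as np

# Toy frame mirroring the sparsity pattern above: the second-officer
# column is mostly null, the rest of the data is dense (values invented).
toy = pd.DataFrame({
    'temp_in_F': [34.4, 40.1, 55.0, 61.2],
    'responding_officer_#2_badge_no': [np.nan, '5455', np.nan, np.nan],
})

# Row-wise dropna keeps only the reports that had a second officer...
print(len(toy.dropna()))  # 1

# ...whereas dropping the sparse column first keeps every report
print(len(toy.drop(columns=['responding_officer_#2_badge_no']).dropna()))  # 4
```

Whether the second-officer information is worth two-thirds of the rows is a modeling decision; the sketch just shows the trade-off.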
In [268]:
# Dropping raw-count features where we already have a percentage, plus the duplicate zip code column
df.drop(columns = ['male', 'female', 'pop_over_16', 'zip_code.1'], inplace = True)
In [269]:
# Creating a daily count df to see if we can add it as a feature
daily_count = pd.DataFrame(df['date_only'].value_counts())
In [270]:
# What does our daily_count look like? (note: its 'date_only' column holds the counts, so this sorts by count)
daily_count.sort_values(by = 'date_only', inplace = True)
daily_count.head()
Out[270]:
date_only
2018-02-13 1
2018-02-12 3
2018-02-14 3
2018-02-06 4
2018-02-11 15
In [271]:
# Cleaning up our new df
daily_count.reset_index(inplace = True)
daily_count.rename(columns = {'index':'date_only', 'date_only':'daily_crime_count'}, inplace = True)
daily_count.sort_values(by = 'date_only', inplace = True)
daily_count.head()
Out[271]:
date_only daily_crime_count
1402 2015-01-01 81
484 2015-01-02 52
1128 2015-01-03 67
248 2015-01-04 47
47 2015-01-05 38
In [272]:
# Combining dataframes
df = pd.merge(df, daily_count, how = 'left', on = 'date_only')
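Since `fbprophet` was imported up top, it is worth noting the shape Prophet will eventually require: a two-column frame named `ds` (datestamp) and `y` (value). A minimal sketch of that reshape, using a hypothetical three-report toy frame in place of `df`:

```python
import pandas as pd

# Toy report-level frame (made-up dates); the notebook's df plays this role
toy = pd.DataFrame({'date_only': pd.to_datetime(
    ['2015-01-01', '2015-01-01', '2015-01-02'])})

# Prophet expects columns 'ds' and 'y', so count per day and rename
prophet_df = (toy['date_only'].value_counts()
                 .sort_index()
                 .rename_axis('ds')
                 .reset_index(name='y'))
print(prophet_df.columns.tolist())  # ['ds', 'y']
print(prophet_df['y'].tolist())     # [2, 1]
```

The resulting frame can be passed straight to `Prophet().fit()` when we get to modeling.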
In [273]:
df.head(100)
Out[273]:
date_only mnth_yr day1_of_the_week time_of_occurrence temp_in_F humidity percip_inches 911_call_type type_of_incident division sector council_district victim_type victim_race victim_ethnicity victim_gender victim_age responding_officer_#1__badge_no responding_officer_#1__name responding_officer_#2_badge_no responding_officer_#2__name x_coordinate y_cordinate zip_code total_pop %_male %_female median_age 18_&_over 21_&_over 62_&_over 65_&_over %_white %_black %_native %_asian %_hispanic %_pop_over_16 %_employed %_unemployed mean_household_income %_families_poverty %_all_people_poverty year mnth day_of_year_number daily_crime_count
0 2015-01-01 January-2015 Thu 02:00 34.35 89.47 0.03 40 - OTHER ASSAULT -VERBAL THREAT NORTHWEST 520.0 D6 Individual Hispanic or Latino Hispanic or Latino Male 36 10767 HOVIS,ALAN 5455 AKON,FREDRICK,CHARLES 2.47e+06 7.00e+06 75220 69009.0 53.0 47.0 33.7 74.9 71.2 12.0 9.1 75.7 7.3 16.0 3.0 48.1 4477.7 62.7 4.8 104542.2 17.9 19.8 2015 1 1 81
1 2015-01-01 January-2015 Thu 00:40 34.35 89.47 0.03 31 - CRIMINAL MISCHIEF CRIM MISCHIEF > OR EQUAL $50 BUT < $500 SOUTHWEST 440.0 D1 Individual Hispanic or Latino Hispanic or Latino Female 22 10486 CASAS-TAVERA,J 10003 GRAY,ROY,CALVERT 2.47e+06 6.95e+06 75211 89910.0 51.0 49.0 28.8 68.4 62.9 8.3 6.5 76.2 10.5 22.9 1.1 78.0 4008.4 60.7 6.6 48504.9 22.7 26.1 2015 1 1 81
2 2015-01-01 January-2015 Thu 03:15 34.35 89.47 0.03 41/11V - BMV-IN PROGRESS BMV CENTRAL 110.0 D2 Individual Hispanic or Latino Hispanic or Latino Male 45 10665 NEEDHAM,CORY 10188 KUNZLER,JEFFERY,L 2.50e+06 6.98e+06 75246 8271.0 53.9 46.1 31.0 84.9 80.8 9.8 7.5 62.5 20.8 1.7 4.7 35.6 2340.0 70.2 4.9 58389.0 23.6 31.8 2015 1 1 81
3 2015-01-01 January-2015 Thu 02:43 34.35 89.47 0.03 6X - MAJOR DIST (VIOLENCE) THREATENING PHONE CALLS NORTH CENTRAL 620.0 D12 Individual Hispanic or Latino Hispanic or Latino Female 41 10698 BEAUDREAULT,ZACHARY 7543 AVILA II,HUMBERTO,JAVIER 2.49e+06 7.04e+06 75248 42333.0 49.7 50.3 42.4 82.8 79.6 20.3 15.8 70.3 7.8 2.7 5.5 17.3 3580.1 67.6 4.2 114881.6 5.2 7.5 2015 1 1 81
4 2015-01-01 January-2015 Thu 01:50 34.35 89.47 0.03 DASF-DIST ACTIVE SHOOTER FOOT ASSAULT -OFFENSIVE CONTACT NORTHEAST 230.0 D9 Individual Hispanic or Latino Hispanic or Latino Male 29 9568 DETAMBLE,STEPHANIE,JULIANNA 9938 MADALINSKI,BRIAN,WILLIAM 2.53e+06 7.00e+06 75228 106467.0 49.1 50.9 31.8 70.2 66.2 11.5 9.0 49.3 23.6 13.2 1.9 49.0 3865.9 60.6 7.0 51758.2 22.6 24.5 2015 1 1 81
5 2015-01-01 January-2015 Thu 22:30 34.35 89.47 0.03 11V - BURG MOTOR VEH BMV SOUTH CENTRAL 740.0 D3 Individual Black Non-Hispanic or Latino Male 27 10070 LARY,WENDY,LEE 10645 SWANSON,LAURENT,RASHAD 2.48e+06 6.94e+06 75232 50429.0 45.1 54.9 35.8 73.8 68.5 17.2 13.4 23.6 72.3 6.4 0.1 23.4 3827.5 50.4 6.3 42833.4 26.6 29.4 2015 1 1 81
6 2015-01-01 January-2015 Thu 02:00 34.35 89.47 0.03 16 - INJURED PERSON ASSAULT -OFFENSIVE CONTACT CENTRAL 120.0 D14 Individual White Non-Hispanic or Latino Female 22 8370 BANKSTON,LARRY,J 8405 BERNIL,JUSTIN,H 2.49e+06 6.98e+06 75204 43764.0 52.5 47.5 32.5 90.4 88.2 10.6 8.3 72.4 10.9 9.1 5.5 23.7 3284.3 76.4 3.5 93590.8 15.9 19.7 2015 1 1 81
7 2015-01-01 January-2015 Thu 02:30 34.35 89.47 0.03 58 - ROUTINE INVESTIGATION ROBBERY OF INDIVIDUAL (AGG) SOUTHWEST 450.0 D3 Individual Hispanic or Latino Hispanic or Latino Female 34 7466 KISNER,NATHAN,EDWARD 10705 LEAL,STEPHANIE 2.47e+06 6.94e+06 75233 21234.0 47.1 52.9 30.6 68.5 64.2 11.9 8.7 49.5 37.2 31.0 0.0 53.1 3794.2 56.5 9.1 44116.0 24.6 27.8 2015 1 1 81
8 2015-01-01 January-2015 Thu 03:00 34.35 89.47 0.03 6X - MAJOR DIST (VIOLENCE) THEFT OF SERVICE > OR EQUAL $20 BUT <$500 NORTH CENTRAL 650.0 D11 Individual Black Non-Hispanic or Latino Male 44 9408 BATEMON,LYNELL,EARL 7750 WILLIAMS,CHRISTOPHER,B 2.50e+06 7.01e+06 75230 39586.0 46.9 53.1 47.9 81.0 79.2 28.4 23.5 88.0 4.5 4.4 3.7 10.2 2721.4 60.5 2.5 183454.5 4.8 6.3 2015 1 1 81
9 2015-01-01 January-2015 Thu 03:00 34.35 89.47 0.03 16 - INJURED PERSON ASSAULT -BODILY INJURY ONLY NORTHWEST 520.0 D2 Individual White Non-Hispanic or Latino Male 35 7741 PARKER,COREY 7389 CLIFFORD,MICHAEL,A 2.47e+06 6.99e+06 75247 11185.0 66.3 33.7 33.5 94.3 89.5 6.0 4.9 46.2 43.6 106.0 0.7 18.1 10621.0 22.4 2.6 78131.0 24.8 28.2 2015 1 1 81
10 2015-01-01 January-2015 Thu 04:00 34.35 89.47 0.03 6X - MAJOR DIST (VIOLENCE) ROBBERY OF INDIVIDUAL NORTHEAST 220.0 D9 Individual Black Non-Hispanic or Latino Male 36 10651 EBENSBERGER,RILEY,TALMADG 8930 CLARK,MATTHEW,RYAN 2.54e+06 7.00e+06 75228 106467.0 49.1 50.9 31.8 70.2 66.2 11.5 9.0 49.3 23.6 13.2 1.9 49.0 3865.9 60.6 7.0 51758.2 22.6 24.5 2015 1 1 81
11 2015-01-01 January-2015 Thu 02:11 34.35 89.47 0.03 40/01 - OTHER INJURED PERSON- PUBLIC PROPERTY (OTHER THAN FI... CENTRAL 140.0 D14 Individual White Non-Hispanic or Latino Female 34 10785 FULTON,GORDON 6093 BURCH,GARY,ROBERT 2.50e+06 6.98e+06 75206 53930.0 52.0 48.0 31.2 86.3 81.0 8.7 6.7 78.6 7.1 8.2 5.4 24.7 2753.3 73.7 3.5 91725.1 11.8 17.4 2015 1 1 81
12 2015-01-01 January-2015 Thu 17:00 34.35 89.47 0.03 6XA - MAJOR DIST AMBULANCE ASSAULT -OFFENSIVE CONTACT SOUTHWEST 410.0 D1 Individual Black Non-Hispanic or Latino Female 18 10030 HERNANDEZ,JORGE,ANTONIO 9794 BIRD,MARC,A 2.49e+06 6.96e+06 75203 39159.0 50.3 49.7 32.1 69.6 65.9 12.3 9.8 58.2 33.6 3.9 0.5 57.9 2619.9 53.2 7.8 43295.1 33.6 37.9 2015 1 1 81
13 2015-01-01 January-2015 Thu 00:00 34.35 89.47 0.03 31 - CRIMINAL MISCHIEF CRIM MISCHIEF > OR EQUAL $50 BUT < $500 SOUTHWEST 410.0 D1 Individual White Non-Hispanic or Latino Female 79 10756 STERLE,PHILIP 8315 GEISSLER,KEITH,THOMAS 2.48e+06 6.96e+06 75208 41872.0 51.6 48.4 34.2 73.5 69.8 12.3 9.6 83.9 4.9 11.2 0.5 70.4 2857.6 61.1 5.5 66624.8 17.7 22.6 2015 1 1 81
14 2015-01-01 January-2015 Thu 04:01 34.35 89.47 0.03 14 - STABBING, CUTTING INJURED PERSON- PUBLIC PROPERTY (OTHER THAN FI... NORTH CENTRAL 640.0 D11 Individual Black Non-Hispanic or Latino Male 20 10640 AKIN,THERON,NATHANIEL 10020 WILLIS,RYAN,DESHAWN 2.50e+06 7.03e+06 75240 47257.0 51.0 49.0 34.1 74.6 71.9 14.8 12.6 53.0 10.2 3.6 4.6 41.8 3637.3 68.7 4.6 84097.6 22.7 23.0 2015 1 1 81
15 2015-01-01 January-2015 Thu 05:00 34.35 89.47 0.03 40/01 - OTHER NATURAL DEATH (NO OFFENSE) SOUTHEAST 350.0 D8 Individual White Non-Hispanic or Latino Female 83 10708 SMYTHE,COLIN 5475 ELL,JEFFREY,A 2.55e+06 6.93e+06 75253 43903.0 51.2 48.8 30.7 68.2 64.6 8.8 7.1 75.1 12.6 83.0 0.4 58.0 5167.5 55.6 4.9 48316.3 22.3 24.6 2015 1 1 81
16 2015-01-01 January-2015 Thu 21:55 34.35 89.47 0.03 11V - BURG MOTOR VEH BMV NORTHWEST 520.0 D2 Individual White Non-Hispanic or Latino Male 64 9480 DICKSON,JAY,CAMDAN 9430 ALMACHAR,JOSEPH 2.48e+06 6.99e+06 75209 33566.0 50.5 49.5 36.9 76.3 74.2 15.2 11.5 75.5 7.2 1.9 3.7 27.5 3739.1 66.8 3.9 160121.0 7.6 11.0 2015 1 1 81
17 2015-01-01 January-2015 Thu 00:15 34.35 89.47 0.03 20 - ROBBERY ROBBERY OF INDIVIDUAL (AGG) NORTHEAST 220.0 D7 Individual Hispanic or Latino Hispanic or Latino Male 23 9568 DETAMBLE,STEPHANIE,JULIANNA 9938 MADALINSKI,BRIAN,WILLIAM 2.53e+06 6.98e+06 75228 106467.0 49.1 50.9 31.8 70.2 66.2 11.5 9.0 49.3 23.6 13.2 1.9 49.0 3865.9 60.6 7.0 51758.2 22.6 24.5 2015 1 1 81
18 2015-01-01 January-2015 Thu 12:00 34.35 89.47 0.03 31 - CRIMINAL MISCHIEF RECKLESS DAMAGE SOUTHWEST 450.0 D3 Individual Black Non-Hispanic or Latino Female 56 10331 BARRIENTOS,JORGE 10462 KELLY,JEANETTE 2.47e+06 6.93e+06 75232 50429.0 45.1 54.9 35.8 73.8 68.5 17.2 13.4 23.6 72.3 6.4 0.1 23.4 3827.5 50.4 6.3 42833.4 26.6 29.4 2015 1 1 81
19 2015-01-01 January-2015 Thu 00:00 34.35 89.47 0.03 09 - THEFT THEFT OF PROP <$50 - OTHER THAN SHOPLIFT SOUTHWEST 430.0 D3 Individual White Non-Hispanic or Latino Female 70 10452 CAMPBELL,HANS 9543 GONZALEZ,REGGIE 2.45e+06 6.91e+06 75249 16882.0 46.6 53.4 33.8 67.2 63.0 10.7 7.6 37.5 50.2 45.0 1.4 30.2 6078.0 63.4 4.0 59221.5 15.8 17.4 2015 1 1 81
20 2015-01-01 January-2015 Thu 13:10 34.35 89.47 0.03 32 - SUSPICIOUS PERSON FOUND PROPERTY (NO OFFENSE) SOUTHEAST 320.0 D5 Individual Unknown Non-Hispanic or Latino Female 34 10725 SMITH,PAUL 4665 NEVILS,RODNEY,L 2.54e+06 6.96e+06 75217 112811.0 49.5 50.5 28.0 65.4 60.6 8.8 6.9 62.5 24.6 19.4 0.1 67.9 4023.9 53.7 5.5 40980.1 30.1 32.6 2015 1 1 81
21 2015-01-01 January-2015 Thu 12:00 34.35 89.47 0.03 09V - UUMV UNAUTHORIZED USE OF MOTOR VEH - AUTOMOBILE SOUTHWEST 410.0 D1 Individual Hispanic or Latino Hispanic or Latino Male 39 10207 BERRY,JEFFREY,WAYNE 10202 ARRIVILLAGA,DANIEL,ALFONSO 2.47e+06 6.96e+06 75211 89910.0 51.0 49.0 28.8 68.4 62.9 8.3 6.5 76.2 10.5 22.9 1.1 78.0 4008.4 60.7 6.6 48504.9 22.7 26.1 2015 1 1 81
22 2015-01-01 January-2015 Thu 22:10 34.35 89.47 0.03 20 - ROBBERY ROBBERY OF INDIVIDUAL (AGG) SOUTH CENTRAL 710.0 D4 Individual Black Non-Hispanic or Latino Male 34 10750 REEDER,ROBERT 8583 HAYNES,ALAN,LENOIR 2.49e+06 6.96e+06 75203 39159.0 50.3 49.7 32.1 69.6 65.9 12.3 9.8 58.2 33.6 3.9 0.5 57.9 2619.9 53.2 7.8 43295.1 33.6 37.9 2015 1 1 81
23 2015-01-01 January-2015 Thu 17:30 34.35 89.47 0.03 09 - THEFT THEFT OF PROP > OR EQUAL $50 BUT <$500- NOT SH... NORTHEAST 230.0 D9 Individual White Non-Hispanic or Latino Male 24 10776 LAWSON,JONATHAN 9482 FRIEND,SHELLEY,DENISE 2.52e+06 6.99e+06 75218 37549.0 48.3 51.7 36.6 77.7 75.0 14.1 11.1 76.2 11.9 11.4 0.9 35.2 4238.4 68.5 3.2 77928.1 14.9 18.4 2015 1 1 81
… (output truncated: rows 24–99 of the `df` preview omitted for readability) …

Since we are approaching this project as if it were a real work situation where analyzing time series data is a daily part of the job, we will model accordingly. In practice it is important to quickly establish 1) whether the data we have carries any signal worth modeling, and 2) that a quick-and-"dirty" OLS linear regression confirms the data is actually worth using. Let's get started with a correlation matrix for our data!

In [274]:
# How does everything relate to each other?
corr = df.corr().abs()
corr
Out[274]:
(31 × 31 matrix of absolute pairwise correlations between the numeric columns, truncated here for readability. Reading down the `daily_crime_count` column: the strongest relationships are with temp_in_F (|r| ≈ 0.24), mean_household_income (≈ 0.08), year (≈ 0.07), and %_families_poverty (≈ 0.06); humidity (≈ 0.02) and percip_inches (≈ 0.0004) show almost no relationship with the daily crime count.)
mnth 1.57e-01 3.10e-02 2.05e-02 9.22e-03 2.20e-03 1.50e-02 6.24e-03 2.89e-04 1.33e-02 4.10e-03 1.20e-03 2.50e-03 8.24e-03 6.35e-03 1.86e-03 1.48e-03 4.96e-03 8.82e-03 0.01 1.54e-02 3.41e-03 1.25e-02 7.88e-03 1.67e-02 2.31e-02 1.79e-02 2.57e-03 2.14e-02 1.00e+00 9.96e-01 1.90e-01
day_of_year_number 1.56e-01 2.98e-02 2.15e-02 8.82e-03 1.76e-03 1.45e-02 6.15e-03 1.12e-04 1.28e-02 4.04e-03 1.25e-03 2.36e-03 8.13e-03 6.28e-03 1.75e-03 1.44e-03 4.84e-03 8.63e-03 0.01 1.45e-02 3.34e-03 1.24e-02 7.67e-03 1.57e-02 2.23e-02 1.71e-02 2.32e-03 2.00e-02 9.96e-01 1.00e+00 1.86e-01
daily_crime_count 2.42e-01 2.39e-02 3.60e-04 9.85e-03 1.53e-02 1.63e-04 3.33e-03 7.88e-03 4.99e-03 3.46e-03 6.80e-03 8.79e-03 9.10e-03 4.93e-03 3.83e-03 8.19e-03 5.44e-03 1.71e-02 0.04 4.77e-02 6.52e-03 2.21e-02 2.46e-02 4.88e-02 8.07e-02 6.00e-02 9.65e-03 7.45e-02 1.90e-01 1.86e-01 1.00e+00
In [275]:
# What do our correlations look like visually?
plt.figure(figsize = (20,15))
sns.heatmap(corr, xticklabels=corr.columns, yticklabels=corr.columns);

AWESOME! Our two main features we hypothesized were important to each other, daily crime count and temperature, do show a correlation! Even though a 20% correlation between variables wouldn't seem too impressive in a "perfect situation", since we are using real-life data this is a great sign that our two features are actually connected! Now that we've looked at the connection between the two, let's quickly take a look at what each feature looks like separately.
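To make the correlation check concrete, here is a minimal sketch on synthetic data; the column names mirror ours, but the numbers are entirely made up:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
temp = rng.uniform(20, 100, 500)                      # fake temperatures in F
# Fake daily counts that loosely track temperature, plus noise
crime = 45 + 0.2 * temp + rng.normal(0, 10, 500)
demo = pd.DataFrame({'temp_in_F': temp, 'daily_crime_count': crime})

# Pearson correlation between the two columns, same as demo.corr() would report
r = demo['temp_in_F'].corr(demo['daily_crime_count'])
print(round(r, 2))
```

Even with a real underlying relationship, the noise keeps the correlation modest, which is exactly the pattern we see in our crime data.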

In [276]:
# What is the distribution of our daily crime counts?
plt.figure(figsize = (20,10))
sns.distplot(df['daily_crime_count'])
plt.title('Distribution of Daily Crime Reports')
plt.xlabel('Daily Crime Count')
plt.ylabel('Occurrence');
In [277]:
# What is the distribution of our temperature?
plt.figure(figsize = (20, 10))
sns.distplot(df['temp_in_F'])
plt.title('Distribution of Temperature (F)')
plt.xlabel('Temperature in Fahrenheit')
plt.ylabel('Occurrence');

NICE! Both of our chosen features have fairly regular distributions and no extreme outliers, which means we can move on to the next step of our analysis.
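"Regular distribution, no crazy outliers" can be checked numerically too. A quick sketch on synthetic counts shaped roughly like ours (mean ~58, std ~12, per the `.describe()` output later in this notebook), using skewness and the standard 1.5 × IQR outlier rule:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
# Fake daily counts, roughly matching our data's summary statistics
counts = pd.Series(rng.normal(58, 12, 1461))

# Skewness near 0 means a roughly symmetric, "regular" distribution
skewness = counts.skew()

# IQR rule: flag anything beyond 1.5 * IQR outside the quartiles as an outlier
q1, q3 = counts.quantile([0.25, 0.75])
iqr = q3 - q1
outliers = counts[(counts < q1 - 1.5 * iqr) | (counts > q3 + 1.5 * iqr)]
print(round(skewness, 2), len(outliers))
```

A handful of flagged points out of ~1,461 days is normal; a large cluster would warrant the kind of closer look we give the early-2018 dip later on.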

In addition to the two main features we are focusing on for our modeling, several other correlations stood out, though not enough to reach our threshold for modeling. Let's go ahead and visualize these relationships below!

In [278]:
# What does the relationship between crime counts and % of Black/African American residents look like?
plt.figure(figsize = (20,15))
sns.lineplot(x= df['%_black'], y = df['daily_crime_count'], data = df, color = 'red')
plt.title('Percentage of Black/African American Residents vs. Crime Count in Dallas, Texas 2015-2018')
plt.xlabel('% of Black/African American Residents')
plt.ylabel('Daily Crime Count');

What does this tell us about this relationship? There is a ton of activity in the 10-30% range, meaning that crime counts are highest in areas where the Black/African American population is between 10% and 30%. We can also see some interesting spikes and valleys throughout the data, but nothing consistent enough to warrant a higher correlation.

In [279]:
# What does the relationship between crime counts and % of Native residents look like?
plt.figure(figsize = (20,15))
sns.lineplot(x= df['%_native'], y = df['daily_crime_count'], data = df, color = 'orange')
plt.title('Percentage of Native American Residents vs. Crime Count in Dallas, Texas 2015-2018')
plt.xlabel('% of Native American Residents')
plt.ylabel('Daily Crime Count');

What does this tell us about this relationship?

We see a ton of activity in the first part of this chart, which makes sense since Dallas has a very small Native American population. The graph tells us that crime counts are fairly high in areas with small Native American populations.

In [280]:
# What does the relationship between crime counts and % of Asian residents look like?
plt.figure(figsize = (20,15))
sns.lineplot(x= df['%_asian'], y = df['daily_crime_count'], data = df, color = 'green')
plt.title('Percentage of Asian Residents vs. Crime Count in Dallas, Texas 2015-2018')
plt.xlabel('% of Asian Residents')
plt.ylabel('Daily Crime Count');

What does this tell us about this relationship?

According to our data, Dallas also has a relatively small Asian population, concentrated in specific areas. The interesting thing on this chart is the large downward spike around 4%, implying that crime counts drop drastically in areas where about 4% of the population is Asian.

In [281]:
# What does the relationship between crime counts and % of population over 16 years old look like?
plt.figure(figsize = (20,15))
sns.lineplot(x= df['%_pop_over_16'], y = df['daily_crime_count'], data = df, color = 'blue')
plt.title('Percentage of Population Over 16 vs. Crime Count in Dallas, Texas 2015-2018')
plt.xlabel('Percentage of Population Over 16')
plt.ylabel('Daily Crime Count');

What does this tell us about this relationship?

Since most of the movement in this graph happens toward the left, crime counts and the over-16 population are most closely related in areas where that population share is smaller.

In [282]:
# What does the relationship between crime counts and % of population employed look like?
plt.figure(figsize = (20,15))
sns.lineplot(x= df['%_employed'], y = df['daily_crime_count'], data = df, color = 'purple')
plt.title('Percentage of Population Employed vs. Crime Count in Dallas, Texas 2015-2018')
plt.xlabel('Percentage of Population Employed')
plt.ylabel('Daily Crime Count');

What does this tell us about this relationship?

The greatest activity in crime counts occurs in areas where 45-75% of the population is employed. The connection weakens the further you move from this range in either direction.

In [283]:
# What does the relationship between crime counts and % of the population that is unemployed look like?
plt.figure(figsize = (20,15))
sns.lineplot(x= df['%_unemployed'], y = df['daily_crime_count'], data = df, color = 'red')
plt.title('Percentage of Population Unemployed vs. Crime Count in Dallas, Texas 2015-2018')
plt.xlabel('Percentage of Population Unemployed')
plt.ylabel('Daily Crime Count');

What does this tell us about this relationship?

Crime counts rise in areas where more than 3% of the population is unemployed, but level out once a larger portion (around 8%) of the population is unemployed.

In [284]:
# What does the relationship between crime counts and mean household income look like?
plt.figure(figsize = (20,15))
sns.lineplot(x= df['mean_household_income'], y = df['daily_crime_count'], data = df, color = 'orange')
plt.title('Mean Household Income vs. Crime Count in Dallas, Texas 2015-2018')
plt.xlabel('Mean Household Income')
plt.ylabel('Daily Crime Count');

What does this tell us about this relationship?

Areas with a mean household income between 100,000 and 200,000 USD show the most drastic swings in crime counts, while the number of reported crimes seems to fall once mean household income passes 250,000 USD.

In [285]:
# What does the relationship between crime counts and % of families in poverty look like?
plt.figure(figsize = (20,15))
sns.lineplot(x= df['%_families_poverty'], y = df['daily_crime_count'], data = df, color = 'yellow')
plt.title('Percentage of Families in Poverty vs. Crime Count in Dallas, Texas 2015-2018')
plt.xlabel('Percentage of Families in Poverty')
plt.ylabel('Daily Crime Count');

What does this tell us about this relationship?

This relationship looks almost linear: there is definitely an upward trend, with the number of reported crimes rising as the percentage of families in poverty in an area increases.

In [286]:
# What does the relationship between crime counts and year look like?
plt.figure(figsize = (20,15))
sns.lineplot(x= df['year'], y = df['daily_crime_count'], data = df, color = 'blue')
plt.title('Year vs. Crime Count in Dallas, Texas 2015-2018')
plt.xlabel('Year')
plt.ylabel('Daily Crime Count');

What does this tell us about this relationship?

We can see that the number of reported crimes took a HUGE drop at the beginning of 2017 but began to climb again shortly after.

1.3: Using Linear Regression and Ordinary Least Squares (OLS) to Validate Our Features

Back to Outline

As we discussed earlier, in real-life situations we won't always have time to build a full ARIMA model on every dataset we are given, and sometimes the data we are asked to look at won't actually have any real value to our business. With that in mind, one quick way to see how important our features are for modeling is to run OLS on each feature individually before looking at their combined scores. Since our model is based on only two features, daily crime count and temperature, this is a pretty short process.
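The statsmodels call is shown in the next cell; as a rough sketch of what OLS does under the hood, here is the same fit done with plain NumPy on made-up data (the intercept and slope used to generate it are illustrative, not our real estimates):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(20, 100, 1000)                     # stand-in for temp_in_F
y = 47.6 + 0.19 * x + rng.normal(0, 12, 1000)      # stand-in for daily counts

# Add a constant column (like sm.add_constant), then solve least squares
X = np.column_stack([np.ones_like(x), x])
beta, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta
```

The recovered `intercept` and `slope` land close to the values used to generate the data, which is all OLS is doing for us here: finding the line that minimizes squared residuals.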

In [287]:
import statsmodels.api as sm

X = df['temp_in_F']
Y = df['daily_crime_count']

# Looking at OLS for temp vs. target
X = sm.add_constant(X)

results = sm.OLS(Y, X).fit()

results.summary()

Out[287]:
OLS Regression Results
Dep. Variable: daily_crime_count R-squared: 0.058
Model: OLS Adj. R-squared: 0.058
Method: Least Squares F-statistic: 5272.
Date: Mon, 18 Nov 2019 Prob (F-statistic): 0.00
Time: 18:04:59 Log-Likelihood: -3.3282e+05
No. Observations: 85066 AIC: 6.656e+05
Df Residuals: 85064 BIC: 6.657e+05
Df Model: 1
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
const 47.6462 0.188 253.828 0.000 47.278 48.014
temp_in_F 0.1916 0.003 72.609 0.000 0.186 0.197
Omnibus: 2227.376 Durbin-Watson: 0.022
Prob(Omnibus): 0.000 Jarque-Bera (JB): 2410.379
Skew: 0.402 Prob(JB): 0.00
Kurtosis: 3.186 Cond. No. 322.


Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

There are several important statistics in the summary above that can tell us whether our data is worth continuing with. First, the R-squared and adjusted R-squared scores: R-squared tells us how much of the variance in our target the model explains. Since we are regressing on a single feature, our R-squared score is pretty low; temperature by itself simply doesn't explain EVERYTHING that drives daily crime counts. The other statistic that matters for whether our model will be worth our time is the p-value. Here both the coefficient's p-value and the Prob (F-statistic) are effectively zero, so the relationship, while weak, is statistically significant. We can simplify our dataframe and try this test again, just to see if there is a noticeable difference.
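R-squared itself is easy to compute by hand, which makes the "how much variance do we explain" idea concrete. A minimal sketch on synthetic data (the same made-up relationship as before, not our real dataset):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(20, 100, 500)
y = 47.6 + 0.19 * x + rng.normal(0, 12, 500)

# Fit a line, then R^2 = 1 - SS_residual / SS_total
slope, intercept = np.polyfit(x, y, 1)
pred = intercept + slope * x
ss_res = ((y - pred) ** 2).sum()
ss_tot = ((y - y.mean()) ** 2).sum()
r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 2))
```

With noise this heavy, R-squared comes out low even though the slope is real and significant, the same combination our OLS summary shows.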

In [288]:
# Simplfying our data below
temp_crime_only = df[['date_only','daily_crime_count', 'temp_in_F']]
temp_crime_only = temp_crime_only.groupby('date_only').mean()
temp_crime_only.head()
Out[288]:
daily_crime_count temp_in_F
date_only
2015-01-01 81 34.35
2015-01-02 52 38.73
2015-01-03 67 43.90
2015-01-04 47 34.73
2015-01-05 38 35.69
In [289]:
# Running our OLS on our reduced data
X = temp_crime_only['temp_in_F']
Y = temp_crime_only['daily_crime_count']

# Looking at OLS for temp vs. target
X = sm.add_constant(X)

results = sm.OLS(Y, X).fit()

results.summary()
Out[289]:
OLS Regression Results
Dep. Variable: daily_crime_count R-squared: 0.073
Model: OLS Adj. R-squared: 0.072
Method: Least Squares F-statistic: 114.9
Date: Mon, 18 Nov 2019 Prob (F-statistic): 7.38e-26
Time: 18:04:59 Log-Likelihood: -5716.0
No. Observations: 1461 AIC: 1.144e+04
Df Residuals: 1459 BIC: 1.145e+04
Df Model: 1
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
const 43.6731 1.394 31.329 0.000 40.939 46.408
temp_in_F 0.2126 0.020 10.719 0.000 0.174 0.251
Omnibus: 32.864 Durbin-Watson: 1.294
Prob(Omnibus): 0.000 Jarque-Bera (JB): 47.598
Skew: 0.232 Prob(JB): 4.62e-11
Kurtosis: 3.753 Cond. No. 309.


Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Our R-squared score has increased as a result of simplifying our dataset, and the p-value (Prob (F-statistic) of 7.38e-26) is still effectively zero, so the relationship holds up at the daily level too. For the purposes of this project we will keep our features as they are and move on to setting up our dataframes for modeling.

2. Model Dataframes Set-Up

Back to Outline

Research question for our project: Can we use past records of crimes and the corresponding weather for the day of the crime to help predict future crimes?

Since our research question focuses mainly on the number of crimes per day and the weather, let's simplify our testing datasets to reflect these features. Time series data is best analyzed as a univariate dataset, so we will create a few sub-datasets focusing on the individual features we want to treat as time series, work with them individually, and then compare them to our other time series data to draw our final conclusions. Let's get started!

2.1: Date vs. Crime Count

Back to Outline

In [290]:
# Since we've already created a daily crime count sub-df let's start there
daily_count.head()
Out[290]:
date_only daily_crime_count
1402 2015-01-01 81
484 2015-01-02 52
1128 2015-01-03 67
248 2015-01-04 47
47 2015-01-05 38
In [291]:
# Changing the index of this df to date_only
daily_count.set_index('date_only', inplace = True)
In [292]:
# Sanity check: did our index actually reset?
daily_count.head()
Out[292]:
daily_crime_count
date_only
2015-01-01 81
2015-01-02 52
2015-01-03 67
2015-01-04 47
2015-01-05 38
In [293]:
# Let's take a quick look at this sub-df and its details
daily_count.describe()
Out[293]:
daily_crime_count
count 1461.00
mean 58.22
std 12.57
min 1.00
25% 50.00
50% 57.00
75% 66.00
max 113.00

Now that we have our daily crime count and date as a separate data frame, we can move on to our date vs. temperature data frame.

2.2: Date vs. Temperature

Back to Outline

In [294]:
# Creating our initial data frame for date and temperature
temp_df = df[['date_only', 'temp_in_F']].copy()
In [295]:
# Checking that it looks ok
temp_df.head()
Out[295]:
date_only temp_in_F
0 2015-01-01 34.35
1 2015-01-01 34.35
2 2015-01-01 34.35
3 2015-01-01 34.35
4 2015-01-01 34.35
In [296]:
# Smushing these down to have one date and one temp each
temp_df = temp_df.groupby('date_only').mean()
In [297]:
# Sanity check: did the smushing work correctly?
temp_df.head()
Out[297]:
temp_in_F
date_only
2015-01-01 34.35
2015-01-02 38.73
2015-01-03 43.90
2015-01-04 34.73
2015-01-05 35.69
In [298]:
# Let's look at the overall info for this dataset
temp_df.describe()
Out[298]:
temp_in_F
count 1461.00
mean 68.46
std 15.98
min 22.41
25% 57.40
50% 70.30
75% 82.59
max 97.53

Excellent! We now have nice, neat sub-dataframes to use for our time series analysis modeling! Next step: decomposing them!

3. Time Series Decomposition

Back to Outline

Although time series data appears at first glance to be just one set of data, there are really four main components that must be handled separately in order to process it correctly: trend, seasonality, noise/error, and the baseline (observed) data. Below we will decompose our time series datasets into their respective parts in order to better understand what is really happening in our data!
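Before handing the job to statsmodels, it helps to see that an additive decomposition is conceptually simple. A sketch on a synthetic daily series (trend + yearly seasonality + noise, all made up): the trend is a centered one-year rolling mean, the seasonality is the average detrended value for each day of the year, and the noise is whatever is left over.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
idx = pd.date_range('2015-01-01', periods=3 * 365, freq='D')
t = np.arange(len(idx))
# Synthetic series: slow upward trend + yearly sine wave + noise
series = pd.Series(50 + 0.01 * t + 10 * np.sin(2 * np.pi * t / 365)
                   + rng.normal(0, 2, len(idx)), index=idx)

# Trend: centered rolling mean over one full season (NaN at the edges)
trend = series.rolling(365, center=True).mean()
# Seasonality: average detrended value for each day of the year
detrended = series - trend
seasonal = detrended.groupby(detrended.index.dayofyear).transform('mean')
# Noise: whatever the trend and seasonality leave unexplained
residual = detrended - seasonal
```

The residual's spread should be much smaller than the original series', since the trend and seasonal components soak up most of the structure; `seasonal_decompose` below does essentially this, just more carefully.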

3.1: Decomposing Our Date vs. Crime Count Data

Back to Outline

In [299]:
# Plotting our dataset to see what it looks like
plt.figure(figsize = (20,10))
plt.plot(daily_count)
plt.title('Date vs. Crime Count: Raw Data')
plt.xlabel('Date')
plt.ylabel('Daily Crime Count');

Looking at the visualization above we can see that our data appears to have some seasonality (there are repeated spikes and valleys in the line), there is some sort of trend (our line almost has a "typical" wave look to it- it follows several up and down curves), and there seems to be some weird data happening around the beginning of 2018. Let's first take a closer look at that weird data before we decompose this data.

In [300]:
# What days did we have less than 20 crime reports?
daily_count.loc[daily_count['daily_crime_count']< 20]
Out[300]:
daily_crime_count
date_only
2018-02-05 18
2018-02-06 4
2018-02-11 15
2018-02-12 3
2018-02-13 1
2018-02-14 3

Intuitively, we know it is very unlikely that the Dallas Police Department, which serves a population of 1.3 million people, had several days with fewer than 20 reported crimes, so let's go ahead and drop these days.

In [301]:
daily_count= daily_count.loc[daily_count['daily_crime_count']> 20]
In [302]:
# Double checking, visually, that our data looks more regular now
plt.figure(figsize = (20,10))
plt.plot(daily_count)
plt.title('Date vs. Crime Count: Raw Data')
plt.xlabel('Date')
plt.ylabel('Daily Crime Count');

That looks so much better! Now that our data is more solid we can begin to separate out the components of our time series.

In [303]:
# To do this we need to import the correct tools
from statsmodels.tsa.seasonal import seasonal_decompose
In [304]:
# Decomposing with a one-year cycle (newer statsmodels versions call this kwarg period=)
decomposition = seasonal_decompose(daily_count, freq = 365)
In [305]:
# Getting the values for our parts
trend_crime = decomposition.trend
seasonal_crime = decomposition.seasonal
noise_crime = decomposition.resid
observed_crime = decomposition.observed
In [306]:
# Plotting out what these parts look like visually

# Plotting original data
plt.figure(figsize= (15,20))
plt.subplot(411)
plt.plot(daily_count, label = 'Original Data')
plt.legend(loc = 'best')
plt.title('Original Date vs. Crime Count')

# Plotting Trend
plt.subplot(412)
plt.plot(trend_crime, 'g', label = 'Trend')
plt.legend(loc = 'best')
plt.title('Trend')

# Plotting Seasonality
plt.subplot(413)
plt.plot(seasonal_crime,'y',  label = 'Seasonality')
plt.legend(loc = 'best')
plt.title('Seasonality')

# Plotting Noise
plt.subplot(414)
plt.plot(noise_crime, 'r', label = 'Noise')
plt.legend(loc = 'best')
plt.title('Noise');

While it is SUPER COOL to see our data "exploded" like this, we still have a few more steps before we can actually model and forecast this bad boy. Let's check stationarity (making sure the rolling average and standard deviation don't change over time).

There are two main ways we can check the stationarity of our data: looking at our rolling statistics and performing an (Augmented) Dickey-Fuller test. Since we are just baby data scientists and time lords, let's go ahead and try both out to see what happens!

In [307]:
# Determining our rolling statistics
rol_mean = daily_count.rolling(12).mean()
rol_std = daily_count.rolling(12).std()
In [308]:
# What do these look like visually?
plt.figure(figsize = (20, 15))
original = plt.plot(daily_count, color = 'blue', label = 'Original')
mean = plt.plot(rol_mean, color = 'green', label = 'Rolling Mean')
std = plt.plot(rol_std, color = 'purple', label = 'Rolling STD')
plt.legend(loc = 'best')
plt.title('Moving Mean and STD')
plt.show(block = False)
In [309]:
# Trying out the Dickey-Fuller test
from statsmodels.tsa.stattools import adfuller

# Performing our Dickey-Fuller test
test = adfuller(daily_count['daily_crime_count'], autolag = 'AIC')
print(test)
(-4.09509656454179, 0.0009867583237109304, 22, 1432, {'1%': -3.4349247631306237, '5%': -2.8635604442944658, '10%': -2.5678456715029183}, 10696.979932260716)
In [310]:
# Translating Our Results
dfoutput = pd.Series(test[0:4], index = ['Test Statistic', 'p-value', '# Lags Used', 'Number of Observations Used'])

print('Results of Dickey-Fuller Test: ')

for key, value in test[4].items():
    dfoutput['Critical Value (%s)' %key] = value

# Printing once, after the loop has added all three critical values
print(dfoutput)
Results of Dickey-Fuller Test: 
Test Statistic                -4.10e+00
p-value                        9.87e-04
# Lags Used                    2.20e+01
Number of Observations Used    1.43e+03
Critical Value (1%)           -3.43e+00
Critical Value (5%)           -2.86e+00
Critical Value (10%)          -2.57e+00
dtype: float64 

For a Dickey-Fuller test there are two main numbers to pay attention to: the test statistic and the critical values. We call the time series stationary (rejecting the unit-root null hypothesis) when the test statistic is smaller, i.e. more negative, than the critical value. In our test above, the test statistic (-4.10) is more negative than even the 1% critical value (-3.43), so we can tell right away that our date vs. crime count time series is stationary! Let's move on to our second time series!
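To make the decision rule explicit, here is a tiny helper of our own (a convenience function, not part of statsmodels) applied to the critical values reported above:

```python
def adf_is_stationary(test_stat, crit_values, level='5%'):
    """Reject the unit-root null (i.e. call the series stationary)
    when the test statistic is more negative than the chosen critical value."""
    return test_stat < crit_values[level]

# Critical values from the Dickey-Fuller output above
crit = {'1%': -3.43, '5%': -2.86, '10%': -2.57}

print(adf_is_stationary(-4.10, crit, '1%'))   # crime counts: True even at 1%
print(adf_is_stationary(-2.73, crit, '5%'))   # False
print(adf_is_stationary(-2.73, crit, '10%'))  # True only at 10%
```

The -2.73 example previews the borderline case we hit with the temperature series in the next section.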

3.2: Decomposing Our Date vs. Temperature Data

Back to Outline

In [311]:
# Plotting our dataset to see what it looks like
plt.figure(figsize = (20,10))
plt.plot(temp_df)
plt.title('Date vs. Temperature in F: Raw Data')
plt.xlabel('Date')
plt.ylabel('Temperature in Fahrenheit');

Overall this data doesn't look too bad! We have a weird outlier in 2016, but otherwise the pattern looks similar to the other years, so we can move on to decomposing this data!

In [312]:
# Setting up the decomposition
# Decomposing with a one-year cycle (newer statsmodels versions call this kwarg period=)
decomposition = seasonal_decompose(temp_df, freq = 365)
In [313]:
# Getting the values for our parts
trend_temp = decomposition.trend
seasonal_temp = decomposition.seasonal
noise_temp = decomposition.resid
In [314]:
# Plotting out what these parts look like visually


# Plotting original data
plt.figure(figsize= (15,20))
plt.subplot(411)
plt.plot(temp_df, label = 'Original Data')
plt.legend(loc = 'best')
plt.title('Original Date vs. Temperature')

# Plotting Trend
plt.subplot(412)
plt.plot(trend_temp, 'g', label = 'Trend')
plt.legend(loc = 'best')
plt.title('Trend')

# Plotting Seasonality
plt.subplot(413)
plt.plot(seasonal_temp,'y',  label = 'Seasonality')
plt.legend(loc = 'best')
plt.title('Seasonality')

# Plotting Noise
plt.subplot(414)
plt.plot(noise_temp, 'r', label = 'Noise')
plt.legend(loc = 'best')
plt.title('Noise');

Just like we did above, we need to test whether this data is stationary.

In [315]:
# Determining our rolling statistics
rol_mean = temp_df.rolling(12).mean()
rol_std = temp_df.rolling(12).std()
In [316]:
# What do these look like visually?
plt.figure(figsize = (20, 15))
original = plt.plot(temp_df, color = 'blue', label = 'Original')
mean = plt.plot(rol_mean, color = 'green', label = 'Rolling Mean')
std = plt.plot(rol_std, color = 'purple', label = 'Rolling STD')
plt.legend(loc = 'best')
plt.title('Moving Mean and STD')
plt.show(block = False)
In [317]:
# Performing our Dickey-Fuller test
test = adfuller(temp_df['temp_in_F'], autolag = 'AIC')

# Translating these results into an easier to read format
dfoutput = pd.Series(test[0:4], index = ['Test Statistic', 'p-value', '# Lags Used', 'Number of Observations Used'])

print('Results of Dickey-Fuller Test: ')

for key, value in test[4].items():
    dfoutput['Critical Value (%s)' %key] = value

# Printing once, after the loop has added all three critical values
print(dfoutput)
Results of Dickey-Fuller Test: 
Test Statistic                   -2.73
p-value                           0.07
# Lags Used                      13.00
Number of Observations Used    1447.00
Critical Value (1%)              -3.43
Critical Value (5%)              -2.86
Critical Value (10%)             -2.57
dtype: float64 

As stated above, our goal in proving that our time series is stationary is a test statistic that is smaller (more negative) than a critical value. Here our test statistic (-2.73) is more negative than the 10% critical value (-2.57) but not the 5% one (-2.86), and the p-value is 0.07, so we can only call this series stationary at the 10% significance level. That is good enough for our purposes: since both of our time series are at least weakly stationary, we can use the data as it is and finally move on to the modeling portion of our time series analysis.

4. Modeling Our Data, Oh My!

Back to Outline

Time series analysis, in most forms, is a univariate analysis. Since we are using the AMAZING Prophet package created by Facebook, which simplifies and streamlines additive-regression modeling of time series, we are able to perform both univariate and multivariate models on our data. Let's start off simply with our univariate models.

Before we can do anything else let's go ahead and make the new sub-data frames we will use for our models!

In [318]:
# Editing our daily count info for prophet
daily_count.reset_index(inplace = True)
daily_count.rename(columns = {'date_only':'ds', 'daily_crime_count':'y'}, inplace = True)
daily_count.head()
Out[318]:
ds y
0 2015-01-01 81
1 2015-01-02 52
2 2015-01-03 67
3 2015-01-04 47
4 2015-01-05 38
In [319]:
# Setting up our temperature info to work with Prophet
temp_df.reset_index(inplace = True)
temp_df.rename(columns = {'date_only':'ds', 'temp_in_F':'y'}, inplace = True)
temp_df.head()
Out[319]:
ds y
0 2015-01-01 34.35
1 2015-01-02 38.73
2 2015-01-03 43.90
3 2015-01-04 34.73
4 2015-01-05 35.69

4.1: Importing 2019 Data and Creating New Sub-data Frames

Back to Outline

Since we already have real data for the majority of 2019, we can prepare it now so we can compare it to our predicted values after modeling!

In [320]:
# Importing our 2019 data here
df2019 = pd.read_csv('df2019.csv')
In [321]:
# Checking to see what this data looks like
df2019.head()
Out[321]:
Unnamed: 0 911_call_type type_of_incident division sector council_district date_of_occurrence year_of_occurrence month_of_occurrence day1_of_the_week time_of_occurrence day_of_the_year victim_type victim_race victim_ethnicity victim_gender victim_age responding_officer_#1__badge_no responding_officer_#1__name responding_officer_#2_badge_no responding_officer_#2__name nibrs_crime_category x_coordinate y_cordinate zip_code date_only year month day_of_year_number month_year daily_crime_count temp_in_F humidity percip_inches
0 0 58 - ROUTINE INVESTIGATION ASSAULT -BODILY INJURY ONLY NORTHWEST 520.0 D2 01/01/2019 2019.0 January Tue 03:00 1.0 Individual Hispanic or Latino Hispanic or Latino Female 33.0 8890 MCDANIEL,TONYA,MARIE NaN NaN ASSAULT OFFENSES 2.47e+06 6.99e+06 75247.0 2019-01-01 2019 1 1 January-2019 268 39.8 77.34 0.0
1 1 40/01 - OTHER DOG BITE - INJURED PERSON NORTHEAST 220.0 D9 01/01/2019 2019.0 January Tue 12:00 1.0 Individual Hispanic or Latino Hispanic or Latino Male 41.0 9844 SMITH JR,GARY,DONALD NaN NaN MISCELLANEOUS 2.53e+06 6.99e+06 75228.0 2019-01-01 2019 1 1 January-2019 268 39.8 77.34 0.0
2 2 20 - ROBBERY ASSAULT -BODILY INJURY ONLY NORTHWEST 510.0 D2 01/01/2019 2019.0 January Tue 04:55 1.0 Individual Black Non-Hispanic or Latino Female 26.0 7180 TAYLOR,DEBORA,ANN NaN NaN ASSAULT OFFENSES 2.47e+06 6.99e+06 75247.0 2019-01-01 2019 1 1 January-2019 268 39.8 77.34 0.0
3 3 40/01 - OTHER UNEXPLAINED DEATH (NO OFFENSE) SOUTH CENTRAL 720.0 D4 01/01/2019 2019.0 January Tue 17:00 1.0 Individual Black Non-Hispanic or Latino Female 56.0 8708 LIGHTLE,ERIC,C 6753 MERENDA,MARK,J MISCELLANEOUS 2.48e+06 6.94e+06 75224.0 2019-01-01 2019 1 1 January-2019 268 39.8 77.34 0.0
4 4 31 - CRIMINAL MISCHIEF CRIM MISCHIEF > OR EQUAL $100 < $750 SOUTHEAST 340.0 D7 01/01/2019 2019.0 January Tue 19:00 1.0 Individual Black Non-Hispanic or Latino Male 30.0 11086 TAN,JADEN,HO NaN NaN DESTRUCTION/ DAMAGE/ VANDALISM OF PROPERTY 2.50e+06 6.96e+06 75215.0 2019-01-01 2019 1 1 January-2019 268 39.8 77.34 0.0
In [322]:
# Dropping any null values!
df2019.dropna(inplace = True)
In [323]:
# Creating our date vs. crime count subset
crime2019 = pd.DataFrame(df2019['date_only'].value_counts())
In [324]:
# Standardizing our new sub-df and taking a quick peek!
crime2019.rename(columns = {'date_only':'count'}, inplace = True)
crime2019.reset_index(inplace = True)
crime2019['index'] = pd.to_datetime(crime2019['index'])
crime2019.set_index('index', inplace = True)
crime2019.sort_index(inplace= True)
crime2019.head()
Out[324]:
count
index
2019-01-01 88
2019-01-02 63
2019-01-03 54
2019-01-04 63
2019-01-05 84
In [325]:
# Slightly modifying our 2019 crime count list so we can use it to merge with our total crimes
crime_2019 = crime2019.copy()
crime_2019.reset_index(inplace = True)
crime_2019.rename(columns = {'index':'ds', 'count':'y'}, inplace = True)
crime_2019['ds']= pd.to_datetime(crime_2019['ds'])
crime_2019.head()
Out[325]:
ds y
0 2019-01-01 88
1 2019-01-02 63
2 2019-01-03 54
3 2019-01-04 63
4 2019-01-05 84
In [326]:
# Adding our 2019 data to our total data
total_crime = pd.concat([daily_count, crime_2019], axis = 0)
In [327]:
# What does our new df look like?
total_crime.head()
Out[327]:
ds y
0 2015-01-01 81
1 2015-01-02 52
2 2015-01-03 67
3 2015-01-04 47
4 2015-01-05 38

Excellent! Now that we have our date vs. crime count data let's move on to our date vs. temperature data.

In [328]:
# Creating our date vs. temperature data frame for 2019
temp2019 = pd.DataFrame(df2019[['date_only', 'temp_in_F']])
In [329]:
# Merging our temps down by day & prepping for Prophet
temp2019 = temp2019.groupby('date_only').mean()
temp2019.reset_index(inplace= True)
temp2019['date_only'] = pd.to_datetime(temp2019['date_only'])
temp2019.set_index('date_only', inplace = True)
temp2019.head()
Out[329]:
temp_in_F
date_only
2019-01-01 39.80
2019-01-02 35.17
2019-01-03 37.36
2019-01-04 44.26
2019-01-05 49.33
In [330]:
# Making another sub-df to help us in merging with total temperature data
temp_2019 = temp2019.copy()
temp_2019.reset_index(inplace= True)
temp_2019.rename(columns = {'date_only':'ds', 'temp_in_F':'y'}, inplace= True)


#Converting our date to datetime
temp_2019['ds']= pd.to_datetime(temp_2019['ds'])
temp_2019.head()
Out[330]:
ds y
0 2019-01-01 39.80
1 2019-01-02 35.17
2 2019-01-03 37.36
3 2019-01-04 44.26
4 2019-01-05 49.33
In [331]:
# Creating a master temp df
total_temp = pd.concat([temp_df, temp_2019], axis = 0)
total_temp.tail()
Out[331]:
ds y
286 2019-10-25 48.75
287 2019-10-26 54.81
288 2019-10-27 60.83
289 2019-10-28 59.27
290 2019-10-29 48.05
In [332]:
#Adding our prior year dataframes together 
all_old = pd.merge(daily_count, temp_df, how='inner', on = 'ds')
all_old.rename(columns = {'y_x':'y', 'y_y':'temp'}, inplace = True)
all_old.head()
Out[332]:
ds y temp
0 2015-01-01 81 34.35
1 2015-01-02 52 38.73
2 2015-01-03 67 43.90
3 2015-01-04 47 34.73
4 2015-01-05 38 35.69
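A small aside on the `y_x` / `y_y` rename: naming the merge suffixes up front skips that step entirely. A sketch with stand-in frames:

```python
import pandas as pd

counts = pd.DataFrame({'ds': ['2015-01-01', '2015-01-02'], 'y': [81, 52]})
temps = pd.DataFrame({'ds': ['2015-01-01', '2015-01-02'], 'y': [34.35, 38.73]})

# suffixes=('', '_temp') leaves the left 'y' untouched and tags the right one,
# so no follow-up rename is needed
merged = pd.merge(counts, temps, how = 'inner', on = 'ds', suffixes = ('', '_temp'))
```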
In [333]:
crime_2019.head()
Out[333]:
ds y
0 2019-01-01 88
1 2019-01-02 63
2 2019-01-03 54
3 2019-01-04 63
4 2019-01-05 84
In [334]:
# Adding all of our 2019 data together
all_new = pd.merge(crime_2019, temp_2019, how= 'inner', on = 'ds')
all_new.rename(columns = {'y_x':'y', 'y_y':'temp'}, inplace = True)
all_new.head()                          
Out[334]:
ds y temp
0 2019-01-01 88 39.80
1 2019-01-02 63 35.17
2 2019-01-03 54 37.36
3 2019-01-04 63 44.26
4 2019-01-05 84 49.33

Now that all of our data is neatly combined we can split our final data into test and training sets.

In [335]:
# Creating a split of crimes for testing and training - crime
total_crime['year'] = total_crime['ds'].dt.year

# Using .copy() so the train/test frames are independent of total_crime
# (avoids pandas' SettingWithCopyWarning when we drop columns below)
train_crime = total_crime.loc[total_crime['year'] <= 2017].copy()
test_crime = total_crime.loc[total_crime['year'] > 2017].copy()
In [336]:
# Drop the year column and finish
train_crime.drop(columns = 'year', inplace = True)
test_crime.drop(columns = 'year', inplace = True)
total_crime.drop(columns = 'year', inplace = True)
In [337]:
# Creating a split of crimes for testing and training - temp
total_temp['year'] = total_temp['ds'].dt.year
# .copy() keeps these slices independent of total_temp, as with the crime split
train_temp = total_temp.loc[total_temp['year'] <= 2017].copy()
test_temp = total_temp.loc[total_temp['year'] > 2017].copy()
train_temp.drop(columns = 'year', inplace = True)
test_temp.drop(columns = 'year', inplace = True)
total_temp.drop(columns = 'year', inplace = True)

Hooray! We now have training and testing datasets! Let's move on to modeling!

4.2: Modeling Our Daily Crime Count Data and Forecasting

Back to Outline

Now that we have all of our data (and sub-datasets) set up we can begin with our univariate modeling. Since we are using the Prophet package, we are limited in the amount of tuning and refining we can do. Most of our model refinement happens on the front end: how we set our data up, how we split it, and which subsets we use when.

Important note:

Above we have two separate sets of data for our different approaches to modeling. First, we have the actual collected data (daily counts from 2015 to 2018, plus data from 2019 only) in two separate data frames. Additionally, we have a self-made split of the overall data (combined data from 2015-2017 and combined data from 2018-2019). Since Prophet offers few options for model tuning, we will try out both datasets and approaches to see which works best!

In [410]:
# Setting up our model in Prophet - Original Crime Data ASIS
model_crime = Prophet(daily_seasonality = True)
model_crime.fit(daily_count)
Out[410]:
<fbprophet.forecaster.Prophet at 0x1e31fcdb388>
In [415]:
# Fixing for model
crime_2019.reset_index(inplace = True)
In [416]:
# Forecasting with our actual 2019 data
crime_test_fcst = model_crime.predict(df=crime_2019)
In [419]:
# Plot the forecast
f, ax = plt.subplots(1)
f.set_figheight(5)
f.set_figwidth(15)
fig = model_crime.plot(crime_test_fcst, ax=ax)
plt.title('Actual and Predicted Crimes (2015 - 2019)- Actual Data')
plt.xlabel('Year')
plt.ylabel('Crime Counts');
In [341]:
# Plot the components (exploded view)
fig = model_crime.plot_components(crime_test_fcst)
plt.tight_layout()
In [342]:
# Run test statistics
mse_crime_a = mean_squared_error(y_true=crime_2019['y'],y_pred=crime_test_fcst['yhat'])
mae_crime_a = mean_absolute_error(y_true=crime_2019['y'],y_pred=crime_test_fcst['yhat'])

def mean_absolute_percentage_error(y_true, y_pred): 
    """Calculates MAPE given y_true and y_pred"""
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

mape_crime_a = mean_absolute_percentage_error(y_true=crime_2019['y'],y_pred=crime_test_fcst['yhat'])

We will review what these statistics mean to our models below!
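One pitfall with the MAPE helper above: it divides by `y_true`, so a day with zero recorded crimes would produce a division by zero. The daily counts in this analysis stay well above zero, but a guarded variant (hypothetical helper `safe_mape`, not part of the notebook's pipeline) is cheap insurance:

```python
import numpy as np

def safe_mape(y_true, y_pred):
    """MAPE that skips zero-valued actuals instead of dividing by zero."""
    y_true, y_pred = np.array(y_true, dtype = float), np.array(y_pred, dtype = float)
    mask = y_true != 0
    return np.mean(np.abs((y_true[mask] - y_pred[mask]) / y_true[mask])) * 100
```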

In [343]:
# Setting up our model in Prophet - crime training
model_crime = Prophet(daily_seasonality = True)
model_crime.fit(train_crime)
Out[343]:
<fbprophet.forecaster.Prophet at 0x1e327d1ae88>
In [344]:
# Using our test data to make predictions
crime_test_fcstb = model_crime.predict(df=test_crime)
In [406]:
# Plot the forecast
f, ax = plt.subplots(1)
f.set_figheight(5)
f.set_figwidth(15)
fig = model_crime.plot(crime_test_fcstb, ax=ax)
plt.title('Actual and Predicted Crime Count (2015 - 2019)- Self Split')
plt.xlabel('Year')
plt.ylabel('Crime Count');
In [346]:
# What do the individual components of this forecast look like?
fig = model_crime.plot_components(crime_test_fcstb)
plt.tight_layout()

Let's now look at what our actual data looks like versus our predictions!

In [347]:
# Making a quick adjustment for modeling!
crime_for1 = crime_test_fcst.copy()
crime_for2 = crime_test_fcstb.copy()
crime_for2.set_index('ds', inplace = True)
crime_for1.set_index('ds', inplace = True)
In [348]:
# Another quick adjustment for modeling
crime_test = test_crime.copy()
crime_test.set_index('ds', inplace = True)
crime_test.head()
Out[348]:
y
ds
2018-01-01 69
2018-01-02 33
2018-01-03 45
2018-01-04 43
2018-01-05 60
In [349]:
# What does predicted 2018 & 2019 crimes look like vs. actual?
plt.figure(figsize= (20,10))

# Plotting our actual data
plt.subplot(211)
plt.plot(crime_test['y'], color= 'green')
plt.title('Actual Crime Report Counts in Dallas, Texas: 2018-2019')
plt.xlabel('Date')
plt.ylabel('Counts');

# Plotting predicted crimes
plt.subplot(212)
plt.plot(crime_for2['yhat'], color = 'red')
plt.title('Predicted Crime Report Counts in Dallas, Texas: 2018-2019')
plt.xlabel('Date')
plt.ylabel('Counts');
plt.tight_layout()
In [420]:
# What do predicted 2019 crimes look like vs. actual?
plt.figure(figsize= (20,10))

# Plotting our actual data
plt.subplot(211)
plt.plot(crime2019['count'], color= 'green')
plt.title('Actual Crime Report Counts in Dallas, Texas: 2019')
plt.xlabel('Date')
plt.ylabel('Counts');

# Plotting predicted crimes
plt.subplot(212)
plt.plot(crime_for1['yhat'], color = 'red')
plt.title('Predicted Crime Report Counts in Dallas, Texas: 2019')
plt.xlabel('Date')
plt.ylabel('Counts');
plt.tight_layout()
In [351]:
# Run test statistics
mse_crime_b = mean_squared_error(y_true= test_crime['y'],y_pred= crime_test_fcstb['yhat'])
mae_crime_b = mean_absolute_error(y_true= test_crime['y'],y_pred= crime_test_fcstb['yhat'])

def mean_absolute_percentage_error(y_true, y_pred): 
    """Calculates MAPE given y_true and y_pred"""
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

mape_crime_b = mean_absolute_percentage_error(y_true= test_crime['y'],y_pred=crime_test_fcstb['yhat'])
In [433]:
print('Crime Test Statistics (2019 Data Only):')
print('The mean squared error is: ', mse_crime_a)
print('The mean absolute error is: ', mae_crime_a)
print('The mean absolute percentage error is :', mape_crime_a)
print('\n')

print('Crime Test Statistics (2018-2019 Data):')
print('The mean squared error is: ', mse_crime_b)
print('The mean absolute error is: ', mae_crime_b)
print('The mean absolute percentage error is :', mape_crime_b)
Crime Test Statistics (2019 Data Only):
The mean squared error is:  231.6692239592333
The mean absolute error is:  12.231039459770427
The mean absolute percentage error is : 21.032825755333036


Crime Test Statistics (2018-2019 Data):
The mean squared error is:  587.8138164846321
The mean absolute error is:  20.292151386087223
The mean absolute percentage error is : 29.70720944739894

We can see several things in our test statistics. Let's start with mean squared error (MSE): the closer an MSE is to zero, the better the model fits. The model built on our real data divisions (trained on all of 2015-2018, tested on 2019) did best, since it has the lower MSE. Mean absolute percentage error (MAPE) measures how far off our predictions are as a percentage of the actual values; as with MSE, lower numbers mean less error.

Based on the statistics above, it is best to keep our original division (train on 2015-2018, test on 2019) and set aside our self-made split. Let's move on to temperature!
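To make these error metrics concrete, here is a tiny worked example with hand-checkable numbers (the arrays are illustrative, not from our data):

```python
import numpy as np

y_true = np.array([50.0, 60.0, 40.0])   # pretend actual daily counts
y_pred = np.array([55.0, 57.0, 44.0])   # pretend model predictions

mse = np.mean((y_true - y_pred) ** 2)                     # (25 + 9 + 16) / 3 ~= 16.67
mae = np.mean(np.abs(y_true - y_pred))                    # (5 + 3 + 4) / 3 = 4.0
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100  # (10% + 5% + 10%) / 3 ~= 8.33
```

These match what `sklearn.metrics.mean_squared_error` / `mean_absolute_error` and our MAPE helper compute on the same arrays.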

4.3: Modeling Our Daily Temperature Data and Forecasting

Back to Outline

In [422]:
# Creating our model for temperature data
model_temp = Prophet(daily_seasonality = True)
model_temp.fit(temp_df)
Out[422]:
<fbprophet.forecaster.Prophet at 0x1e326260dc8>
In [424]:
# Temp. fix for model
temp_2019.reset_index(inplace = True)
In [425]:
# Making our set of predictions using our actual 2019 data
temp_test_fcsta = model_temp.predict(df = temp_2019)
In [431]:
# What does this look like visually?
f, ax = plt.subplots(1)
f.set_figheight(5)
f.set_figwidth(15)
fig = model_temp.plot(temp_test_fcsta, ax=ax)
plt.title('Actual and Predicted Temperatures for Dallas, Texas (2015-2019)- 2019 Testing Only')
plt.xlabel('Year')
plt.ylabel('Temperature (F)');
In [356]:
# Plot the components- original data
fig = model_temp.plot_components(temp_test_fcsta)
plt.tight_layout()
In [357]:
# Test statistics, yo!
mse_temp_a = mean_squared_error(y_true= temp_2019['y'],y_pred= temp_test_fcsta['yhat'])
mae_temp_a = mean_absolute_error(y_true= temp_2019['y'],y_pred= temp_test_fcsta['yhat'])

def mean_absolute_percentage_error(y_true, y_pred): 
    """Calculates MAPE given y_true and y_pred"""
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

mape_temp_a = mean_absolute_percentage_error(y_true= temp_2019['y'],y_pred= temp_test_fcsta['yhat'])

Look below for our statistics and a breakdown of what they mean!

In [358]:
# Modeling our temperature data with our self-created split sets
model_temp = Prophet(daily_seasonality = True)
model_temp.fit(train_temp)
Out[358]:
<fbprophet.forecaster.Prophet at 0x1e327b2d448>
In [359]:
# Making another set of predictions using our test data
temp_test_fcstb = model_temp.predict(df = test_temp)
In [430]:
# Plot the forecast- Self made split data
f, ax = plt.subplots(1)
f.set_figheight(5)
f.set_figwidth(15)
fig = model_temp.plot(temp_test_fcstb, ax=ax)
plt.title('Actual and Predicted Temperature Dallas, Texas (2015-2019)- 2018-2019 Testing')
plt.xlabel('Date')
plt.ylabel('Temperature (F)');
In [361]:
# Plot the components- Self-Split data
fig = model_temp.plot_components(temp_test_fcstb)
plt.tight_layout()
In [362]:
# Making a quick adjustment
plot_test_temp = test_temp.copy()
plot_test_temp.set_index('ds', inplace = True)
plot_test_temp.head()
Out[362]:
y
ds
2018-01-01 22.41
2018-01-02 24.72
2018-01-03 29.80
2018-01-04 35.89
2018-01-05 42.94
In [363]:
# What does predicted 2018 & 2019 temperature look like vs. actual?
plt.figure(figsize= (20,10))

# Plotting our actual data
plt.subplot(211)
plt.plot(plot_test_temp['y'], color = 'green')
plt.title('Actual Temperature in Dallas, Texas: 2018-2019')
plt.xlabel('Date')
plt.ylabel('Temperature (F)');

# Plotting predicted temps
plt.subplot(212)
plt.plot(temp_test_fcstb['yhat'], color = 'red')
plt.title('Predicted Temperature in Dallas, Texas: 2018-2019')
plt.xlabel('Date')
plt.ylabel('Temperature (F)');
plt.tight_layout()
In [432]:
# What does predicted 2019 temperature look like vs. actual?
plt.figure(figsize= (20,10))

# Plotting our actual data
plt.subplot(211)
plt.plot(temp2019['temp_in_F'], color= 'green')
plt.title('Actual Temperature in Dallas, Texas: 2019')
plt.xlabel('Date')
plt.ylabel('Temperature (F)');

# Plotting predicted temps
plt.subplot(212)
plt.plot(temp_test_fcsta['yhat'], color = 'red')
plt.title('Predicted Temperature in Dallas, Texas: 2019')
plt.xlabel('Date')
plt.ylabel('Temperature (F)');
plt.tight_layout()
In [365]:
# Test statistics, yo!
mse_temp_b = mean_squared_error(y_true= test_temp['y'],y_pred= temp_test_fcstb['yhat'])
mae_temp_b = mean_absolute_error(y_true= test_temp['y'],y_pred= temp_test_fcstb['yhat'])

def mean_absolute_percentage_error(y_true, y_pred): 
    """Calculates MAPE given y_true and y_pred"""
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

mape_temp_b = mean_absolute_percentage_error(y_true= test_temp['y'],y_pred= temp_test_fcstb['yhat'])
In [434]:
# Test statistics go here

print('Test Statistics Temperature (2019 Data Only):')
print('The mean squared error is: ', mse_temp_a)
print('The mean absolute error is: ', mae_temp_a)
print('The mean absolute percentage error is :', mape_temp_a)

print('\n')
print('Test Statistics Temperature (2018-2019 Data):')
print('The mean squared error is: ', mse_temp_b)
print('The mean absolute error is: ', mae_temp_b)
print('The mean absolute percentage error is :', mape_temp_b)
Test Statistics Temperature (2019 Data Only):
The mean squared error is:  56.787589766994806
The mean absolute error is:  5.869535678982928
The mean absolute percentage error is : 9.702686359896587


Test Statistics Temperature (2018-2019 Data):
The mean squared error is:  73.82345311987564
The mean absolute error is:  7.1209167448680875
The mean absolute percentage error is : 11.70282262412885

Just as we saw with our crime predictions, our error scores are lower when we train on the original 2015-2018 dataset and use the 2019 data as the testing set. We will keep that model going forward!

4.4: Comparing Our Models and Actual Data

Back to Outline

Unlike other auto-regressive models used in time series analysis, Prophet boasts the ability to add regressors, making it able to perform multivariate analysis on time series data. We will attempt to do just that below!
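One practical consequence of `add_regressor`: the frame passed to `predict` must carry the regressor's value for every forecast date, which is why we merged temperature into our 2019 data. A minimal sketch of assembling such a frame (dates and temps here are illustrative):

```python
import pandas as pd

# Prediction frame: Prophet needs 'ds' plus every extra regressor column
future_dates = pd.DataFrame({'ds': pd.to_datetime(['2019-01-01', '2019-01-02'])})
future_temps = pd.DataFrame({'ds': pd.to_datetime(['2019-01-01', '2019-01-02']),
                             'temp': [39.8, 35.2]})

# A left merge attaches the regressor; any date missing a temperature would
# surface as NaN, which Prophet rejects, so gaps are easy to spot
future = pd.merge(future_dates, future_temps, how = 'left', on = 'ds')
```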

In [367]:
# Making a model with both features combined (since our original vs. new data was so successfull let's keep that)
m = Prophet(daily_seasonality = True)
m.add_regressor('temp')
m.fit(all_old)
Out[367]:
<fbprophet.forecaster.Prophet at 0x1e327a6d248>
In [368]:
# Making our predictions using our testing set
forecast_all = m.predict(df = all_new)
In [375]:
forecast_all.head()
Out[375]:
ds trend yhat_lower yhat_upper trend_lower trend_upper additive_terms additive_terms_lower additive_terms_upper daily daily_lower daily_upper extra_regressors_additive extra_regressors_additive_lower extra_regressors_additive_upper temp temp_lower temp_upper weekly weekly_lower weekly_upper yearly yearly_lower yearly_upper multiplicative_terms multiplicative_terms_lower multiplicative_terms_upper yhat
0 2019-01-01 85.52 45.91 70.98 85.52 85.52 -27.26 -27.26 -27.26 -18.26 -18.26 -18.26 -6.48 -6.48 -6.48 -6.48 -6.48 -6.48 -7.33 -7.33 -7.33 4.81 4.81 4.81 0.0 0.0 0.0 58.25
1 2019-01-02 85.55 46.66 71.15 85.55 85.55 -26.90 -26.90 -26.90 -18.26 -18.26 -18.26 -7.52 -7.52 -7.52 -7.52 -7.52 -7.52 -5.96 -5.96 -5.96 4.84 4.84 4.84 0.0 0.0 0.0 58.65
2 2019-01-03 85.59 49.57 73.31 85.59 85.59 -23.93 -23.93 -23.93 -18.26 -18.26 -18.26 -7.03 -7.03 -7.03 -7.03 -7.03 -7.03 -3.47 -3.47 -3.47 4.84 4.84 4.84 0.0 0.0 0.0 61.66
3 2019-01-04 85.62 57.85 83.02 85.62 85.62 -15.56 -15.56 -15.56 -18.26 -18.26 -18.26 -5.47 -5.47 -5.47 -5.47 -5.47 -5.47 3.38 3.38 3.38 4.80 4.80 4.80 0.0 0.0 0.0 70.07
4 2019-01-05 85.66 63.17 87.81 85.66 85.66 -10.39 -10.39 -10.39 -18.26 -18.26 -18.26 -4.33 -4.33 -4.33 -4.33 -4.33 -4.33 7.49 7.49 7.49 4.73 4.73 4.73 0.0 0.0 0.0 75.27
In [369]:
# Plot the components
fig = m.plot_components(forecast_all)
fig.suptitle('Exploded Components: Forecast of All Features Together', y = 1.02)
plt.tight_layout()
In [383]:
# Cleaning up for plotting
combo_forecast = forecast_all.copy()
combo_forecast.set_index('ds', inplace = True)
In [394]:
temp_2019.set_index('ds', inplace = True)
In [396]:
crime_2019.set_index('ds', inplace = True)
In [405]:
# Plotting our forecast vs. actuals
plt.figure(figsize = (20,10))
plt.plot(combo_forecast['yhat'], color= 'red', label = "Combined Forecast")
plt.plot(crime_2019['y'], color = 'green', label = 'Actual Crime Counts')
plt.plot(temp_2019['y'], color= 'blue', label = 'Actual Temperatures')
plt.legend(loc = 'best')
plt.title('Combined Forecast vs. Actual Crime Count and Temperature')
plt.xlabel('Date');
In [435]:
# Test statistics, yo!
mse_final = mean_squared_error(y_true= all_new['y'],y_pred= forecast_all['yhat'])
mae_final = mean_absolute_error(y_true= all_new['y'],y_pred= forecast_all['yhat'])
mape_final = mean_absolute_percentage_error(y_true= all_new['y'],y_pred= forecast_all['yhat'])

print('Test Statistics- Multivariate Model:')
print('The mean squared error is: ', mse_final)
print('The mean absolute error is: ', mae_final)
print('The mean absolute percentage error is :', mape_final)
Test Statistics- Multivariate Model:
The mean squared error is:  225.6226043993055
The mean absolute error is:  12.022216569253338
The mean absolute percentage error is : 20.63725903321737

Overall, our errors aren't too high, which is great! Still, without combining all of the data and splitting it into testing groups on our own criteria, we can't re-test this model with other parameters, and we don't know how well it performs relative to other models. Prophet has an additional capability to cross-validate models, so let's try that below to see if we can tune our model any further!
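Prophet derives its cutoff dates from the three windows we pass in: it steps backwards from the last date that still leaves a full `horizon` of held-out data, in `period`-sized strides, keeping every cutoff with at least `initial` days of training history behind it. A sketch of that logic, assuming our history runs 2015-01-01 through 2018-12-31 (the exact end date is an assumption):

```python
import pandas as pd

start, end = pd.Timestamp('2015-01-01'), pd.Timestamp('2018-12-31')
initial = pd.Timedelta('730 days')   # training data required before the first cutoff
period = pd.Timedelta('180 days')    # spacing between cutoffs
horizon = pd.Timedelta('365 days')   # held-out span after each cutoff

# Walk backwards from the last date that still leaves a full horizon
cutoffs = []
cutoff = end - horizon
while cutoff - start >= initial:
    cutoffs.append(cutoff)
    cutoff -= period
cutoffs = sorted(cutoffs)
# Three cutoffs: 2017-01-05, 2017-07-04, 2017-12-31 -- matching the log below
```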

In [371]:
# Importing our needed tools
from fbprophet.diagnostics import cross_validation
df_cv = cross_validation(m, initial='730 days', period='180 days', horizon = '365 days')
df_cv.head()
INFO:fbprophet:Making 3 forecasts with cutoffs between 2017-01-05 00:00:00 and 2017-12-31 00:00:00
Out[371]:
ds yhat yhat_lower yhat_upper y cutoff
0 2017-01-06 58.66 47.31 70.03 51 2017-01-05
1 2017-01-07 61.85 50.45 73.05 62 2017-01-05
2 2017-01-08 63.38 51.39 74.98 67 2017-01-05
3 2017-01-09 57.84 46.47 70.22 58 2017-01-05
4 2017-01-10 56.53 44.62 68.06 65 2017-01-05
In [372]:
# Let's look at performance metrics for this
from fbprophet.diagnostics import performance_metrics
df_p = performance_metrics(df_cv)
df_p.head()
Out[372]:
horizon mse rmse mae mape coverage
0 37 days 126.75 11.26 8.98 0.17 0.70
1 38 days 127.67 11.30 9.09 0.17 0.69
2 39 days 130.04 11.40 9.24 0.18 0.69
3 40 days 129.93 11.40 9.30 0.18 0.69
4 41 days 129.14 11.36 9.25 0.18 0.69
In [373]:
# What do statistics look like the further out you go in time?
df_p.tail()
Out[373]:
horizon mse rmse mae mape coverage
324 361 days 296.11 17.21 14.38 0.25 0.40
325 362 days 308.83 17.57 14.78 0.26 0.38
326 363 days 314.34 17.73 14.93 0.26 0.38
327 364 days 321.14 17.92 15.08 0.26 0.38
328 365 days 327.63 18.10 15.20 0.26 0.38
In [374]:
# Let's plot what this all looks like
from fbprophet.plot import plot_cross_validation_metric
fig = plot_cross_validation_metric(df_cv, metric='mape')

Cross validation in Prophet is, thankfully, much the same as any other cross validation: it helps us tune the model and gauge how accurate its predictions are. In the image above, each dot is the MAPE for an individual prediction and the line is the rolling MAPE for the whole model. Our main takeaway: the further out in time we predict, the less accurate our model gets. This is why our self-split models, which had to predict up to two years ahead, were less accurate than the model trained on all of the original data and tested on 2019 alone.
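The mechanics behind that plot can be sketched in plain pandas: each prediction's horizon is its distance from its cutoff, and absolute percentage errors are averaged per horizon. (Prophet's `performance_metrics` additionally smooths over a rolling window of rows; the `df_cv_demo` frame here is an illustrative stand-in for the real `df_cv`.)

```python
import pandas as pd

# Stand-in for Prophet's cross_validation output (ds, cutoff, y, yhat)
df_cv_demo = pd.DataFrame({
    'ds': pd.to_datetime(['2017-01-06', '2017-01-07', '2017-07-05']),
    'cutoff': pd.to_datetime(['2017-01-05', '2017-01-05', '2017-07-04']),
    'y': [50.0, 60.0, 40.0],
    'yhat': [55.0, 57.0, 44.0],
})

# Horizon = how far past its cutoff each prediction lies
df_cv_demo['horizon'] = df_cv_demo['ds'] - df_cv_demo['cutoff']
df_cv_demo['ape'] = (df_cv_demo['y'] - df_cv_demo['yhat']).abs() / df_cv_demo['y']

# Average absolute percentage error per horizon
mape_by_horizon = df_cv_demo.groupby('horizon')['ape'].mean()
```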

5. Final Thoughts/Reflection

Back to Outline

Time series modeling is a mythical creature in the world of data science. It is very difficult to learn how to deal with this type of data because very few resources explain how to manipulate and interpret time series data without requiring a master's in statistics or econometrics. In the future I would like to continue working on this project and refining my models as I learn more about this specialization.

Want to know what I did with this data? Check out the project page here